An Empirical Approach to Text Categorization Based on Term Weight Learning

نویسندگان

  • Fumiyo Fukumoto
  • Yoshimi Suzuki
چکیده

In this paper) we propose a method for text categorizaLion task using term weight learning. In our approach, learning is to learn true keywords from the error of clustering results. Parameters of term weighting are then estimated so as to maximize the true keywords and minimize the other words in the text. The characteristic of our approach is that the degree of context dependency is used in order to judge whether a word in a text is a true keyvv·ord or not. The experiments using Wall Street Journal corpus demonstrate the effectiveness of the method.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure

Author profiling is a text classification technique, which is used to predict the profiles of unknown text by analyzing their writing styles. Author profiles are the characteristics of the authors like gender, age, nativity language, country and educational background. The existing approaches for Author Profiling suffered from problems like high dimensionality of features and fail to capture th...

متن کامل

Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA

With the explosive growth in amount of information, it is highly required to utilize tools and methods in order to search, filter and manage resources. One of the major problems in text classification relates to the high dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of features space. There are many feature selection methods. However...

متن کامل

Efficient Method Based on Combination of Deep Learning Models for Sentiment Analysis of Text

People's opinions about a specific concept are considered as one of the most important textual data that are available on the web. However, finding and monitoring web pages containing these comments and extracting valuable information from them is very difficult. In this regard, developing automatic sentiment analysis systems that can extract opinions and express their intellectual process has ...

متن کامل

Does a New Simple Gaussian Weighting Approach Perform Well in Text Categorization?

A new approach to the Text Categorization problem is here presented. It is called Gaussian Weighting and it is a supervised learning algorithm that, during the training phase, estimates two very simple and easily computable statistics which are: the Presence P, how much a term / is present in a category c\ the Expressiveness E, how much / is present outside c in the rest of the domain. Once the...

متن کامل

A Systematic Review of Banking Business Models with an Approach to Sustainable Development

Modern banks have shifted their function as purely administrative, economic and industrial entities into socio-political institutions that must be sensitive to the surrounding environment. This function has always been neglected. This study was conducted based on primary, secondary, and tertiary data and reviews the full text of 75 studies selected from more than 245 studies. The selected elect...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1998